Using PageRank to Characterize Web Structure

نویسندگان

  • Gopal Pandurangan
  • Prabhakar Raghavan
  • Eli Upfal
چکیده

Recent work on modeling the Web graph has dwelt on capturing the degree distributions observed on the Web. Pointing out that this represents a heavy reliance on “local” properties of the Web graph, we study the distribution of PageRank values (used in the Google search engine) on the Web. This distribution is of independent interest in optimizing search indices and storage. We show that PageRank values on the Web follow a power law. We then develop detailed models for the Web graph that explain this observation, and moreover remain faithful to previously studied degree distributions. We analyze these models, and compare the analyses to both snapshots from the Web and to graphs generated by simulations on the new models. To our knowledge this represents the first modeling of the Web that goes beyond fitting degree distributions on the Web.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Novel Approach to Feature Selection Using PageRank algorithm for Web Page Classification

In this paper, a novel filter-based approach is proposed using the PageRank algorithm to select the optimal subset of features as well as to compute their weights for web page classification. To evaluate the proposed approach multiple experiments are performed using accuracy score as the main criterion on four different datasets, namely WebKB, Reuters-R8, Reuters-R52, and 20NewsGroups. By analy...

متن کامل

Using Spam Farm to Boost PageRank

Today people have become more and more dependent on search engines such as Google, Yahoo, and MSN, etc., for their information needs. Web spamming has emerged to take the economic advantage of high search rankings and threatened the accuracy and fairness of those rankings. Understanding spamming techniques is essential for evaluating the strength and weakness of a ranking algorithm, and for fig...

متن کامل

Towards Supporting Exploratory Search over the Arabic Web Content: The Case of ArabXplore

Due to the huge amount of data published on the Web, the Web search process has become more difficult, and it is sometimes hard to get the expected results, especially when the users are less certain about their information needs. Several efforts have been proposed to support exploratory search on the web by using query expansion, faceted search, or supplementary information extracted from exte...

متن کامل

Topic-Sensitive PageRank: A Context-Sensitive Ranking Algorithm for Web Search

The original PageRank algorithm for improving the ranking of search-query results computes a single vector, using the link structure of the Web, to capture the relative “importance” of Web pages, independent of any particular search query. To yield more accurate search results, we propose computing a set of PageRank vectors, biased using a set of representative topics, to capture more accuratel...

متن کامل

Verifying Nash Equilibria in PageRank Games on Undirected Web Graphs

J. Hopcroft and D. Sheldon originally introduced the PageRank game to investigate the self-interested behavior of web authors who want to boost their PageRank by using game theoretical approaches. The PageRank game is a multiplayer game where players are the nodes in a directed web graph and they place their outlinks to maximize their PageRank value. They give best response strategies for each ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • Internet Mathematics

دوره 3  شماره 

صفحات  -

تاریخ انتشار 2002